Student ID: 2021AIML064

Student Name: Jagadish Yalla

K Nearest Neighbors Project

Welcome to the KNN Project! Go ahead and just follow the directions below.

Get the Data [1 mark]
EDA [2 marks]
Train Test Split [1 mark]
Using KNN [1 mark]
Predictions and evaluations [2 marks]
Choosing the k value [1 mark]
Retraining with new k-value [1 mark]

Import Libraries

Import pandas, seaborn, and the usual libraries.

Get the Data

Read the 'KNN_Project_Data' CSV file into a dataframe.

Check the head of the dataframe.
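A minimal sketch of these two steps, assuming pandas is imported as pd. The helper name load_data, the in-memory sample, and its column names F1 and F2 are hypothetical stand-ins so the snippet runs without the real file; in the notebook the same call would be pointed at the project's data file.

```python
import io

import pandas as pd


def load_data(source):
    """Read a CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Hypothetical two-row sample standing in for the project's data file.
sample_csv = io.StringIO("F1,F2,TARGET CLASS\n1.0,2.0,0\n3.0,4.0,1\n")

df = load_data(sample_csv)
print(df.head())  # first rows, to confirm the columns were parsed
```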

EDA

Since this data is artificial, we'll just do a large pairplot with seaborn.

Use seaborn on the dataframe to create a pairplot with the hue indicated by the TARGET CLASS column.

Standardize the Variables

Time to standardize the variables.

Import StandardScaler from scikit-learn.

Create a StandardScaler() object called scaler.

Fit scaler to the features.

Use the .transform() method to transform the features to a scaled version.
Note: fit and transform were both applied in the same line of code.

Convert the scaled features to a dataframe and check the head of this dataframe to make sure the scaling worked.
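The standardization steps above can be sketched as follows, here with fit_transform, which fits and transforms in one call. The data is randomly generated and the feature names A, B, and C are assumptions; only TARGET CLASS comes from the instructions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Stand-in data with a non-zero mean and non-unit spread.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(loc=5, scale=3, size=(100, 3)),
                  columns=["A", "B", "C"])
df["TARGET CLASS"] = rng.integers(0, 2, size=100)

# Scale only the features, never the label column.
features = df.drop("TARGET CLASS", axis=1)
scaler = StandardScaler()
scaled = scaler.fit_transform(features)  # fit and transform in one line

# Back to a DataFrame so the column names are kept.
df_feat = pd.DataFrame(scaled, columns=features.columns)
print(df_feat.head())
```

After scaling, each column should have a mean close to 0 and a standard deviation close to 1, which is a quick way to check that the scaling worked.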

Train Test Split

Use train_test_split to split your data into a training set and a testing set.
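A short sketch of the split on stand-in arrays. The 30% test size and the random_state value are assumptions, since the instructions do not specify them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-ins for the scaled features and the TARGET CLASS column.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# Hold out 30% of the rows for testing (assumed proportion).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
print(X_train.shape, X_test.shape)
```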

Using KNN

Import KNeighborsClassifier from scikit-learn.

Create a KNN model instance with n_neighbors=1

Fit this KNN model to the training data.
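These three steps might look like the sketch below. The data comes from make_classification as a stand-in for the project data, and the split parameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled project data.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

# k = 1: each prediction copies the label of the single closest training point.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
```

With k = 1 the model reproduces its training labels exactly, so training accuracy says little here; the test set is what matters.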

Predictions and Evaluations

Let's evaluate our KNN model!

Use the predict method to predict values using your KNN model and X_test.

Create a confusion matrix and classification report.
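A sketch of the evaluation step, again on synthetic stand-in data with assumed split parameters, so the printed numbers do not reflect the project data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled project data.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
pred = knn.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, pred)
report = classification_report(y_test, pred)
print(cm)
print(report)
```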

Choosing a K Value

Let's go ahead and use the elbow method to pick a good K Value!

Create a for loop that trains various KNN models with different k values, then keep track of the error_rate for each of these models with a list. Refer to the lecture if you are confused about this step.
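The loop could be sketched like this. The range of k from 1 to 39 is an assumption, and the data is a synthetic stand-in; the error rate is taken as the fraction of test predictions that differ from the true labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled project data.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

k_values = range(1, 40)  # assumed search range
error_rate = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    pred_k = knn.predict(X_test)
    # Fraction of misclassified test points for this k.
    error_rate.append(np.mean(pred_k != y_test))
```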

Now create the following plot using the information from your for loop.
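A possible version of the plot, wrapped in a hypothetical helper named plot_error_rate. The styling, title, and axis labels are assumptions, since the reference figure is not reproduced here, and the error rates are computed on synthetic stand-in data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def plot_error_rate(k_values, error_rate):
    """Line plot of test error rate against k (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(list(k_values), error_rate, linestyle="dashed", marker="o")
    ax.set_title("Error Rate vs. K Value")
    ax.set_xlabel("K")
    ax.set_ylabel("Error Rate")
    return fig, ax


# Synthetic stand-in data and an assumed search range for k.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)
k_values = range(1, 40)
error_rate = [
    np.mean(KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
            .predict(X_test) != y_test)
    for k in k_values
]

fig, ax = plot_error_rate(k_values, error_rate)
```

The elbow is the region where the curve stops dropping sharply and flattens out; a k from that region is usually a reasonable choice.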

Choose best K value using

Retrain with new K Value

Retrain your model with the best K value (up to you to decide what you want) and re-do the classification report and the confusion matrix.
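Retraining follows the same pattern as the first model, only with a different n_neighbors. In the sketch below chosen_k = 9 is an example value, and the data is a synthetic stand-in, so the printed report does not reflect the project data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled project data.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

chosen_k = 9  # example value; use whichever k the elbow plot suggests
knn = KNeighborsClassifier(n_neighbors=chosen_k)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
```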

From the plot above, both test and train accuracy are good at 9 neighbors. Cross-validation gave a best K value of 19, but the iterative loop gave better accuracy at K = 9, so I chose 9 neighbors and retrained the model with K = 9.
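The note above compares a cross-validated K with the one found by the loop. The exact cross-validation procedure is not stated, so the sketch below assumes 5-fold cross_val_score on the training set with k from 1 to 39, run on synthetic stand-in data; the k it selects will not match the values reported above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled project data.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101
)

# Mean 5-fold accuracy on the training set for each candidate k (assumed setup).
cv_scores = {}
for k in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(knn, X_train, y_train, cv=5).mean()

best_k_cv = max(cv_scores, key=cv_scores.get)
print("Best k by cross-validation:", best_k_cv)
```

Cross-validation scores each k on several splits of the training data, while the loop scores it on a single test split, so the two approaches can disagree as they did here.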